Supervised Categorization of JavaScript™ Using Program Analysis Features
Authors
Abstract
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purposes can often help to interpret the web page or provide crucial information about it. I have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. I then treat the task of understanding embedded scripts as a text categorization problem. I show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from program analysis to improve classification performance. I perform experiments on the standard WT10G web page corpus and show that my techniques eliminate over 50% of errors relative to a standard text classification baseline.
Subject Descriptors: H.3.3 Information Search and Retrieval; F.3.2 Semantics of Programming Languages; D.2.8 Metrics
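The idea of augmenting text features with script-specific features can be illustrated with a minimal sketch. It is not the paper's feature set or classifier: the pattern names in `STRUCTURAL_PATTERNS` are hypothetical, and regular expressions stand in for real program analysis purely to keep the example self-contained. The resulting feature counts could be fed to any standard text classifier.

```python
import re
from collections import Counter

# Hypothetical structural cues, chosen for illustration only.
# A real system would derive such features from a parsed program,
# not from regular expressions over the raw source.
STRUCTURAL_PATTERNS = {
    "calls_document_write": r"\bdocument\.write(?:ln)?\s*\(",
    "calls_window_open": r"\bwindow\.open\s*\(",
    "function_definitions": r"\bfunction\b",
    "event_handlers": r"\bon(?:click|load|mouseover|submit)\b",
}


def lexical_features(script: str) -> Counter:
    """Bag-of-words counts over identifier-like tokens (the text baseline)."""
    tokens = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", script)
    return Counter(f"tok:{token.lower()}" for token in tokens)


def structural_features(script: str) -> Counter:
    """Counts of the hypothetical structural cues found in the script."""
    counts = Counter()
    for name, pattern in STRUCTURAL_PATTERNS.items():
        matches = len(re.findall(pattern, script))
        if matches:
            counts[f"struct:{name}"] = matches
    return counts


def combined_features(script: str) -> Counter:
    """Lexical counts augmented with structural counts.

    The "tok:" and "struct:" prefixes keep the two feature spaces disjoint.
    """
    return lexical_features(script) + structural_features(script)


if __name__ == "__main__":
    example = 'function popup() { window.open("ad.html"); }'
    for feature, count in sorted(combined_features(example).items()):
        print(feature, count)
```

Prefixing feature names keeps the lexical and structural spaces separate, so a downstream classifier can weight a call such as `window.open(` differently from the bare tokens `window` and `open`.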
Similar resources
Supervised Categorization of JavaScript™ Using Program Analysis Features
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then view understanding embedded scripts a...
Detection and Analysis of Shellcode in Malicious Documents
Shellcode is a code snippet used as a payload when exploiting a software vulnerability. In recent attacks, shellcode embedded in documents is one of the most widely used vectors for targeted attacks. The significant aspects of these documents are dynamic content and URL access, and they can be camouflaged easily. Most security mechanisms are not equipped to deal with these weaponised document...
The Reliability of Metrics Based on Graded Relevance
Improving weak ad-hoc retrieval by Web assistance and data fusion; Query expansion with the minimum relevance judgments; Improved concurrency control technique with lock-free querying for multi-dimensional index structure; A color-based image retrieval method using color distribution and common bitmap; A probabilistic model for music recommendation considering audio features...
Text Categorization using the Semi-Supervised Fuzzy c-Means Algorithm
Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. Over the past few years, TC has become very important, especially in the Information Retrieval area, where information needs have increased tremendously with the rapid growth of textual information sources such as the Internet. In this paper, we compare, for text categoriz...
A Practical Blended Analysis for Dynamic Features in JavaScript
JavaScript is widely used in Web applications; however, its dynamism renders static analysis ineffective. Our JavaScript Blended Analysis Framework is designed to handle JavaScript dynamic features. It performs a flexible combined static/dynamic analysis. The blended analysis focuses static analysis on a dynamic calling structure collected at runtime in a lightweight manner, and refines the sta...